Tagging Icelandic Text using a Linguistic and a Statistical Tagger
Abstract
We describe our linguistic rule-based tagger, IceTagger, and compare its tagging accuracy with that of TnT, a state-of-the-art statistical tagger, on Icelandic, a morphologically complex language. Evaluation shows an average tagging accuracy of 91.54% for IceTagger and 90.44% for TnT. When tag profile gaps in the lexicon used by TnT are filled with tags produced by our morphological analyser, IceMorphy, TnT's tagging accuracy increases to 91.18%.
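The gap-filling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a "tag profile" is the set of tags a lexicon lists for a word, and that filling a gap means adding tags proposed by a morphological analyser that the lexicon lacks. The function `fill_tag_profile_gaps`, the stand-in `toy_analyser`, and the words and tags are all hypothetical placeholders; they are not the actual IceMorphy or TnT interfaces, and the abstract does not specify how the filling was implemented.

```python
from typing import Callable, Dict, Set

# A lexicon maps each word form to its tag profile (the set of tags seen for it).
Lexicon = Dict[str, Set[str]]


def fill_tag_profile_gaps(
    lexicon: Lexicon, analyse: Callable[[str], Set[str]]
) -> Lexicon:
    """Return a new lexicon in which every tag profile is the union of the
    original tags and the tags proposed by the analyser (assumed behaviour)."""
    filled: Lexicon = {}
    for word, tags in lexicon.items():
        # Copy the original profile so the input lexicon is left unchanged.
        filled[word] = set(tags) | set(analyse(word))
    return filled


def toy_analyser(word: str) -> Set[str]:
    """Hypothetical stand-in for a morphological analyser: returns the tags it
    considers possible for a word, or an empty set if it has no proposal."""
    proposals = {"alpha": {"TAG_A", "TAG_B"}}
    return proposals.get(word, set())


# Placeholder data: "alpha" is missing TAG_B in the lexicon (a profile gap).
lexicon: Lexicon = {"alpha": {"TAG_A"}, "beta": {"TAG_C"}}
filled = fill_tag_profile_gaps(lexicon, toy_analyser)

for word in sorted(filled):
    print(word, sorted(filled[word]))
```

In this sketch a statistical tagger would then read `filled` in place of the original lexicon, so that tags the training corpus happened to omit for a word are still available as candidates at tagging time.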
Similar resources
Improving the PoS tagging accuracy of Icelandic text
Previous work on part-of-speech (PoS) tagging Icelandic has shown that the morphological complexity of the language poses considerable difficulties for PoS taggers. In this paper, we increase the tagging accuracy of Icelandic text by using two methods. First, we present a new tagger, by integrating an HMM tagger into a linguistic rule-based tagger. Our tagger obtains state-of-the-art tagging ac...
Icelandic Data Driven Part of Speech Tagging
Data driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our sy...
Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic
In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database, and word embeddings indu...
Further Results and Analysis of Icelandic Part of Speech Tagging
Data driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our sy...
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
This paper gives the results of an experiment concerned with training three different taggers on tagged Icelandic text. The taggers fnTBL, TnT and MXPOST were trained on the corpus of the Icelandic Frequency Dictionary that contains over 500 thousand running words that have been tagged with morphological tags. The tagset contains over 600 tags. Different methods for tagger combination were also...